<html>
<head><title>Kristen Summers -- Document Structure Classification</title></head>

<body>
<h1>Near-Wordless Document Structure Classification</h1>

<p>In <em>Proceedings of the International Conference on Document
Analysis and Recognition</em> (ICDAR '95), pp. 462-465, Montr&eacute;al,
August 1995.</p>

<hr>

<p><strong>Abstract</strong><br>
Automatic derivation of logical document structure from generic layout
would enable a range of electronic document manipulation
tools of a kind that is becoming crucial to users who
browse the Internet.
This problem can be divided into segmentation (dividing the text
into a hierarchy of pieces) and classification (categorizing
these pieces as particular logical structures).
This paper proposes an approach to classifying
logical document structures by their
distance from prototypes that are primarily geometric.  The
prototypes make minimal use of linguistic information,
which reduces reliance on the accuracy of OCR and
decreases language dependence.  Different classes of logical
structures, and the differences in the
information required to classify them, are presented.
A prototype format is proposed,
existing prototypes and a distance measurement are described, and 
performance results are provided.</p>

<hr>

You can view the <a href="http://www.cs.cornell.edu/Info/People/summers/Papers/classify.ps">full postscript file</a>
or return to <a href="http://www.cs.cornell.edu/Info/People/summers/summers.html">my home page</a>.
</body>
</html>
